COMPUTATION OF THE OPTIMAL AVERAGE COST POLICIES
FOR THE TWO TERMINAL SHUTTLE

In this paper we consider the problem of determining the optimal
average cost policy for operating a shuttle between two terminals.
The passengers arrive at each of the terminals according to Poisson
processes and are transported by a single carrier with capacity $Q \le \infty$
operating between the terminals. Under a fairly general cost structure,
we show that the optimal average cost policy is monotone. Bounds are
derived for the optimal control function, and computational procedures
for determining the optimal policy for both the finite and infinite
capacity cases are presented.

1.  Introduction 

In an earlier paper [2] we have shown that a stationary monotone
policy minimizes the expected total discounted cost of operating a
finite capacity shuttle between two terminals. In this paper we show
that the results of [2] can be used to obtain the optimal average cost
policies for both the finite capacity and the infinite capacity shuttles.
In particular, we present methods by which the optimal policy can actually
be computed. For the infinite capacity shuttle the optimal policy can
be determined by solving a system of linear equations. However, for
the finite capacity case the problem turns out to be much more complex.
In Section 4 we present an approximate method for finding the optimal
policy for the finite capacity case. The problem of finding these
policies is non-trivial because the state space for this problem is
infinite. Since this paper is a natural extension of the earlier paper
[2], we assume that the reader is familiar with the results of that
paper. In the following we briefly describe the model and the various
assumptions.

We consider a batch service queue consisting of a carrier with
capacity $Q \le \infty$, operating between two terminals numbered 0 and 1
respectively. Passengers arrive at these terminals according to independent
Poisson processes $X(t)$ and $Y(t)$ with respective intensities
$\lambda_0$ and $\lambda_1$. The carrier can be held at these terminals until either
a new passenger arrives and another decision is made, or the carrier
is dispatched and no decision is made until it arrives at the next
terminal. When $x$ passengers are present at the terminal, the batch
size of the passengers boarding the carrier equals $x \wedge Q = \min\{x, Q\}$.

The costs associated with operating the system consist of a carrying
cost and a holding cost. The cost of carrying $y$ passengers is $R + cy$
and the cost of holding $x$ passengers is $hx$ per unit of time, where
$R$, $c$ and $h$ are nonnegative constants. Without loss of generality,
we can assume that no holding cost is charged during the interterminal
travel time to the passengers who are already aboard the carrier.
The interterminal travel times are assumed to be independent positive
random variables with identical distribution $B(\cdot)$, finite mean $\mu$
and finite second moment $\sigma^2$. Our objective is to determine a policy,
that is, a sequence of decision rules, which minimizes the long range
expected average cost of operating the system. Throughout this paper
we assume that $2\lambda_0\mu < Q$ and $2\lambda_1\mu < Q$. Under this assumption, it
can be shown that the expected queue length at each terminal is finite.
Without loss of generality, we also assume that $c = 0$, because if
the expected queue length is finite, then all the arriving customers
will ultimately be dispatched and the contribution of the proportional
carrying cost to the expected average cost will be the same under all
policies. Hence the policy which is optimal for $c = 0$ is also optimal
for $c > 0$.
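
The model primitives described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`batch_size`, `carrying_cost`, `is_stable`) are hypothetical and do not appear in the paper, and `float("inf")` is used here to stand for the infinite capacity case.

```python
def batch_size(x, capacity):
    """Number of passengers boarding the carrier: min{x, Q}."""
    return int(min(x, capacity))


def carrying_cost(y, R, c):
    """Cost of carrying y passengers: fixed cost R plus proportional cost c*y."""
    return R + c * y


def is_stable(lam0, lam1, mu, capacity):
    """Check the standing assumption 2*lam0*mu < Q and 2*lam1*mu < Q."""
    return 2 * lam0 * mu < capacity and 2 * lam1 * mu < capacity


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    print(batch_size(7, 5), carrying_cost(4, 10.0, 0.5), is_stable(2.0, 1.0, 1.0, 5))
```

With an infinite capacity, `is_stable` is always true for finite arrival rates, which matches the remark that the assumption only restricts the finite capacity case.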

There has been relatively little published work in the area of
optimal control of shuttles. In [2] Deb has shown that for the discounted
cost case the optimal policy is monotone. He also suggests
methods for approximating the optimal discounted policy by linear
functions. For the infinite capacity case, Ignall et al. [7] consider
the problem of computing the average cost under a simple (not necessarily
optimal) operating policy. In this paper we show that the optimal
average cost policy has the following form.

Let the state of the system be denoted by the triplet $(x,y,\delta)$,
where $x$ and $y$ are the respective numbers of passengers at terminals
0 and 1, and $\delta$ is either 0 or 1 according to whether the carrier is
at terminal 0 or 1 respectively. Then there are monotone decreasing
control functions $G_0(y) \le Q$ and $G_1(x) \le Q$ such that if $\delta = 0$
($\delta = 1$), then the optimal policy is to dispatch the carrier if and
only if $x \ge G_0(y)$ ($y \ge G_1(x)$). In Remark 3.8 we show that
$G_0(y) \le m_0 - y$ and $G_1(x) \le m_1 - x$, where $m_0$ and $m_1$ are positive constants.
In Section 4.1 we present a policy improvement algorithm and a linear
programming formulation for determining the optimal policy for the
infinite capacity case. However, for the finite capacity case, it is
not easy to compute the optimal policy. If $Q/2\mu$ is considerably
larger than the arrival rates $\lambda_0$ and $\lambda_1$, the results for the infinite
capacity case can be used as an approximation to the finite capacity
case. However, for small $Q$ the approximation is crude. In Section
4.2 we present a modified policy approximation algorithm for determining
the approximate optimal policy for this case.

2.  Preliminaries 

The notation introduced in this section is used throughout this
paper. We let $X(t)$ and $Y(t)$ denote the number of arrivals in time
$t$ at terminals 0 and 1 respectively. Set $Z(t) = X(t) + Y(t)$,
$\lambda = \lambda_0 + \lambda_1$, and let the random variables $\tau$, $\xi_0$ and $\xi_1$ respectively
denote an arbitrary interterminal travel time and arbitrary interarrival
times at terminals 0 and 1. Let $\xi = \min\{\xi_0, \xi_1\}$ and let $V_\alpha(x,y,\delta)$
be the optimal $\alpha$-discounted cost when the system is in state $(x,y,\delta)$.
Without loss of clarity we shall often suppress the discount factor $\alpha$.
For instance, the $\alpha$-discounted cost of the policy $\pi$ is designated by
$V_\pi(x,y,\delta)$. We also let $\phi_\pi$ and $\phi$ respectively denote the expected
average cost of the policy $\pi$ and the optimal expected average cost.
It can be shown that the optimal $\alpha$-discounted cost satisfies the following
functional equation.


(2.1)  $V_\alpha(x,y,\delta) = \min\{f(x,y,\delta),\ g(x,y,\delta)\}$,

where $f$ is the cost of holding the carrier until the next arrival and
$g$ is the cost of dispatching the carrier immediately. Letting

    $H(x) = hx/(\alpha+\lambda)$,  $a = E\{\exp(-\alpha\xi)\} = \lambda/(\alpha+\lambda)$,
    $p = P[X(\xi) = 1,\ Y(\xi) = 0] = \lambda_0/\lambda$,
    $q = P[X(\xi) = 0,\ Y(\xi) = 1] = 1 - p = \lambda_1/\lambda$,
    $\bar{H}(x) = E \int_0^{\tau} e^{-\alpha t}\, h\,(x + Z(t))\, dt$,

and

    $d_{ij} = \int_0^{\infty} e^{-\alpha t}\, P[X(t) = j,\ Y(t) = i-j]\, dB(t)$,

we can write

(2.2)  $f(x,y,\delta) = H(x+y) + a p\, V_\alpha(x+1,y,\delta) + a q\, V_\alpha(x,y+1,\delta)$,

(2.3)  $g(x,y,0) = R + \bar{H}(x+y-x\wedge Q) + \sum d_{ij}\, V_\alpha(x - x\wedge Q + j,\ y+i-j,\ 1)$,
       $g(x,y,1) = R + \bar{H}(x+y-y\wedge Q) + \sum d_{ij}\, V_\alpha(x+j,\ y - y\wedge Q + i-j,\ 0)$.

The summation on $d_{ij}$ is taken over the set $\{i \ge 0,\ 0 \le j \le i\}$.
These equations are the same as those derived in [8]. Since we are
interested in the average cost in this paper, we can set $\alpha = 0$ in
evaluating $H(x)$ and $\bar{H}(x)$ (see p. 161 of [8]). Then for $\alpha = 0$

(2.4)  $H(x) = hx/\lambda$,  $\bar{H}(x) = h\mu x + h\lambda\sigma^2/2$.
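
As a sanity check on the undiscounted one-step costs, the following sketch evaluates the holding cost until the next arrival, $hx/\lambda$, and the holding cost accumulated over one trip, taken here to be $E\int_0^\tau h(x + Z(t))\,dt = h\mu x + h\lambda\sigma^2/2$ with $\sigma^2$ the second moment of the travel time. The function names are hypothetical, and the numerical check assumes a deterministic travel time $\tau = \mu$ (so that $\sigma^2 = \mu^2$), which is an assumption made only for this illustration.

```python
def hold_cost(x, h, lam):
    """Expected holding cost until the next arrival at alpha = 0: h*x/lam."""
    return h * x / lam


def travel_cost(x, h, lam, mu, second_moment):
    """Expected holding cost over one trip at alpha = 0: h*mu*x + h*lam*E[tau^2]/2."""
    return h * mu * x + h * lam * second_moment / 2.0


def travel_cost_deterministic(x, h, lam, mu, steps=1000):
    """Midpoint-rule value of the integral of h*(x + lam*t) over [0, mu].

    Uses E[Z(t)] = lam*t for a Poisson process and assumes tau = mu exactly.
    """
    dt = mu / steps
    return sum(h * (x + lam * (k + 0.5) * dt) for k in range(steps)) * dt


if __name__ == "__main__":
    # Hypothetical values: x = 3 waiting, h = 1, lam = 3, mu = 2, E[tau^2] = 4.
    print(travel_cost(3, 1.0, 3.0, 2.0, 4.0))
    print(travel_cost_deterministic(3, 1.0, 3.0, 2.0))
```

Both printed values should agree up to rounding, since the integrand is linear in $t$ and the midpoint rule is exact for linear functions.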


In [2] we have shown that (2.1) is well-defined in the sense $V_\alpha < \infty$
and the stationary policy which satisfies (2.1) is optimal. In a
fashion similar to equation (2.10) of [2] we define the n-period problem
as follows. Let

(2.5)  $L(x,y,\delta) = R + \bar{H}(x + y - \delta\, y\wedge Q - (1-\delta)\, x\wedge Q)$.

Set

(2.6)  $V^0(x,y,\delta) = (x + y + \lambda/\alpha)\,h/\alpha$,
       $V^n(x,y,\delta) = \min\{f^n(x,y,\delta),\ g^n(x,y,\delta)\}$,

and for $n \ge 0$

(2.7)  $f^{n+1}(x,y,\delta) = H(x+y) + a p\, V^n(x+1,y,\delta) + a q\, V^n(x,y+1,\delta)$,

(2.8)  $g^{n+1}(x,y,\delta) = L(x,y,\delta) + \sum d_{ij}\, V^n(x - (1-\delta)\, x\wedge Q + j,\ y - \delta\, y\wedge Q + i - j,\ 1-\delta)$.


Note that we have suppressed the influence of $\alpha$ on the n-period cost.
The functions $f^{n+1}$ and $g^{n+1}$ are the same as defined in (2.2) and (2.3)
except that $V_\alpha$ in the right side of (2.2) and (2.3) has been replaced
by $V^n$ in (2.7) and (2.8). In [2] we have shown that $V^n \to V$, $f^n \to f$
and $g^n \to g$. In addition, for any function $\omega(x,y,z)$ and $\gamma \in \{0,1\}$,
we define the difference operator $\Delta$ as follows:

(2.9)  $\Delta\omega_\gamma(x,y,z) = \omega(x,y,z) - \omega(x-1+\gamma,\ y-\gamma,\ z)$.
As before, we sometimes suppress $\gamma$. For instance, the statement $\Delta f \ge \Delta g$
means $\Delta f_0 \ge \Delta g_0$ and $\Delta f_1 \ge \Delta g_1$.
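
The difference operator is easy to misread, so a small sketch may help: for $\gamma = 0$ it takes a backward difference in the first argument, and for $\gamma = 1$ a backward difference in the second argument. The helper name `delta` and the test function `w` are hypothetical and serve only as an illustration.

```python
def delta(omega, gamma):
    """Return the function (x, y, z) -> omega(x, y, z) - omega(x-1+gamma, y-gamma, z)."""
    if gamma not in (0, 1):
        raise ValueError("gamma must be 0 or 1")

    def differenced(x, y, z):
        return omega(x, y, z) - omega(x - 1 + gamma, y - gamma, z)

    return differenced


if __name__ == "__main__":
    # A linear test function: the differences recover its slopes in x and in y.
    w = lambda x, y, z: 2 * x + 5 * y + z
    print(delta(w, 0)(3, 4, 1), delta(w, 1)(3, 4, 1))
```

For the linear function used here the two differences are the coefficients of $x$ and $y$, which is why statements such as $\Delta V \ge 0$ can be read as monotonicity in the queue length at each terminal.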

In Section 3 we show that there is a discount factor $\alpha$ such that
the optimal (stationary) $\alpha$-discounted policy is also optimal for the
average cost case. Call this policy $\pi^*$. In the remark following Lemma
3.5 we show that $\phi_{\pi^*}$ exists and is finite. Furthermore, under the
policy $\pi^*$, the semi-Markov process is positive recurrent. Therefore
using  arguments  similar  to  those  used  in  the  Theorem  J.6  of  [8],  we 
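can show that $\phi$ satisfies the following system of equations.

(2.10)  $v(x,y,\delta) = \min\{\hat{f}(x,y,\delta),\ \hat{g}(x,y,\delta)\}$,

where

(2.11)  $\hat{f}(x,y,\delta) = h\lambda^{-1}(x+y) + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta) - \lambda^{-1}\phi$,

and

(2.12)  $\hat{g}(x,y,0) = r_0(x,y) - \mu\phi + \sum_{i \ge 0,\ j \ge 0} p_{ij}\, v(x - x\wedge Q + i,\ y+j,\ 1)$,

(2.13)  $\hat{g}(x,y,1) = r_1(x,y) - \mu\phi + \sum_{i \ge 0,\ j \ge 0} p_{ij}\, v(x+i,\ y - y\wedge Q + j,\ 0)$,

with

    $p_{ij} = P[X(\tau) = i,\ Y(\tau) = j] = p_i \hat{p}_j$,
    $p_i = P[X(\tau) = i]$,  $\hat{p}_j = P[Y(\tau) = j]$,
    $r_\delta(x,y) = R + h\mu(x + y - \delta\, y\wedge Q - (1-\delta)\, x\wedge Q) + h\lambda\sigma^2/2$.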


Note that equations (2.10)-(2.13) can be obtained directly from equations
(2.1)-(2.3) by setting $\alpha = 0$ and subtracting $\lambda^{-1}\phi$ and $\mu\phi$
respectively from $f$ and $g$. The fact that the optimal average cost $\phi$
satisfies (2.10)-(2.13) can be easily checked. Let $\pi$ be a stationary
policy with stationary probabilities $\psi_\pi(x,y,\delta)$ [note that the optimal
policy $\pi^*$ is of the same type]. Let $S_\pi$ be the set of states such
that the carrier is dispatched whenever $(x,y,\delta) \in S_\pi$, and let $\bar{S}_\pi$
be its complement. Then the average cost

    $\phi_\pi = \dfrac{\sum_{S_\pi} \psi_\pi(x,y,\delta)\, r_\delta(x,y) + \sum_{\bar{S}_\pi} \psi_\pi(x,y,\delta)\, h\lambda^{-1}(x+y)}{\lambda^{-1} \sum_{\bar{S}_\pi} \psi_\pi(x,y,\delta) + \mu \sum_{S_\pi} \psi_\pi(x,y,\delta)}$.

Now, suppose $\phi$ satisfies (2.10); then from (2.10), we have

    $v(x,y,\delta) \le \hat{f}(x,y,\delta)$ for $(x,y,\delta) \in \bar{S}_\pi$,
    $v(x,y,\delta) \le \hat{g}(x,y,\delta)$ for $(x,y,\delta) \in S_\pi$.

Now pre-multiplying both sides of the above inequalities by $\psi_\pi(x,y,\delta)$
and summing over all $(x,y,\delta)$, we can show that $\phi \le \phi_\pi$. In particular,
if $\pi^*$ satisfies (2.10), then the inequalities above are replaced
by equalities and $\phi = \phi_{\pi^*}$. Also note that the function $v$ inherits
the behavior of $V_\alpha$. In fact, subtracting $V_\alpha(0,0,0)$ from both sides
of (2.1) and taking the limit as $\alpha \to 0$, we can show that

(2.14)  $v(x,y,\delta) = \lim_{\alpha \to 0} \left(V_\alpha(x,y,\delta) - V_\alpha(0,0,0)\right)$.

This limit exists because the embedded Markov chain under the optimal
$\alpha$-discounted policy is positive recurrent for small $\alpha$.


3.  Average Cost Policy

In this section we extend the results of the discounted cost case [2]
and show how one can obtain bounds on the control functions $G_0$ and $G_1$.
First we show that the optimal average cost is bounded. Consider
the policy $\theta$ under which the carrier is always dispatched. The
resulting queueing system can be analyzed as two separate queues, one
at each terminal, with mean service time $2\mu$ and respective arrival
rates $\lambda_0$ and $\lambda_1$. Let $\tau$ and $\tau'$ be arbitrary interterminal travel
times; then the random variable $T = \tau + \tau'$ can be viewed
as the service time for each of the queues. Note that $T$ has the mean
$2\mu$ and second moment $2(\sigma^2 + \mu^2)$. Suppose $W_\delta$ is the expected queue
length at the terminal $\delta$. Since $2\lambda_\delta\mu < Q$, the queue length $W_\delta$
is finite and the average cost $\phi_\theta = R\mu^{-1} + h(W_0 + W_1)$.


In particular, the expected queue length [...] where $\pi_i^\delta$ are the stationary
probabilities of the Markov chain with the transition matrix [...].
Since $\phi$ is the optimal average cost, we have

(3.1)  $\phi \le \phi_\theta < \infty$.

We summarize the main results of the discounted cost case [2] in
Theorems 3.1 and 3.2. These results are then used to show that the
optimal average cost policy has the same form as the optimal policy
of the discounted cost case.

Theorem 3.1. If $h > \alpha R/Q$, then

(i)  $\Delta V^n \ge \Delta g^n$,  $\Delta V^n \le \Delta f^n$,  $\Delta f^n \ge \Delta g^n$;

(ii)  $\Delta V^n \ge 0$.

This is essentially a restatement of Theorem 5.3 of [2]. Note
that Theorem 3.1 (i) and (ii) also hold for the infinite time
horizon problem (Corollary 4.1 of [2]) and hence

(3.2)  $\Delta V \ge \Delta g$,  $\Delta V \le \Delta f$,  $\Delta f \ge \Delta g$  and  $\Delta V \ge 0$.

Furthermore, in view of Equation (2.14), the function $v$ inherits
the structure of $V_\alpha$ and therefore

(3.3)  $\Delta v \ge \Delta\hat{g}$,  $\Delta v \le \Delta\hat{f}$,  $\Delta\hat{f} \ge \Delta\hat{g}$  and  $\Delta v \ge 0$.

Also note that if $v(x,y,\delta) = \hat{f}(x,y,\delta)$, then $\hat{f}(x,y,\delta) \le \hat{g}(x,y,\delta)$.
In addition, either $v(x+1,y,0) = \hat{f}(x+1,y,0)$ or $v(x+1,y,0) = \hat{g}(x+1,y,0)$.
In the first case $\Delta v_0(x+1,y,0) = \Delta\hat{f}_0(x+1,y,0) \ge 0$ and in the second
case $\Delta v_0(x+1,y,0) \ge \Delta\hat{g}_0(x+1,y,0) \ge 0$. Similarly one can show that
$\Delta v_1(x,y+1,1) \ge 0$. We use this fact in Theorem 3.7.

Theorem 3.2. If $h > \alpha R/Q$, then there are monotone decreasing functions
$G_\delta(\cdot) \le Q$, $\delta = 0, 1$, such that the following is an optimal $\alpha$-discounted
policy. Suppose the state of the system is $(x,y,\delta)$ and $\delta = 0$
($\delta = 1$); then the optimal policy is to dispatch the carrier if and
only if $x \ge G_0(y)$ ($y \ge G_1(x)$).

This is Theorem 4.2 of [2]. Note that $G_\delta$ depends on the discount
factor $\alpha$. In Lemma 3.5 we show that for some $\alpha > 0$, $G_\delta(\cdot)$ is
also an optimal average cost policy. Before doing so we need the
following additional lemma and theorem.

For $\alpha > 0$, let $\pi^*(\alpha)$ be the optimal discounted cost policy.
Define $\Pi(\beta) = \{\pi^*(\alpha): 0 < \alpha < \beta\}$ to be the set of $\alpha$-optimal policies
for each $\alpha < \beta$.


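Lemma 3.3. If $h > \bar{\alpha}R/Q$, then $\Pi(\beta)$ is finite for some $\beta > 0$.

Proof: Note that $h > \alpha R/Q$ for all $0 < \alpha < \bar{\alpha}$. Now let $\theta$ be the
policy under which the carrier is never held at the terminals and let
$V_\theta$ be the corresponding $\alpha$-discounted cost. Then $\alpha V_\theta(x,y,\delta) \to \phi_\theta$ as
$\alpha \to 0$ and hence for any $\epsilon > 0$, we can choose $\alpha(x,y,\delta)$ such that
for all $0 < \alpha < \alpha(x,y,\delta)$, $\alpha V_\theta(x,y,\delta) < \phi_\theta + \epsilon$. Now, set $m = 1 +$
integer part of $\{(\phi_\theta + \epsilon)/h\}$, $\bar{\beta} = \min\{\alpha(x,y,\delta): x + y = m,\ \delta = 0, 1\}$
and $\beta = \min(\bar{\beta}, \bar{\alpha})$. Clearly for all $0 < \alpha < \beta$, $x \ge 0$, $y \ge 0$ and
$x + y = m$,

(3.4)  $\alpha V_\alpha(x,y,\delta) \le \alpha V_\theta(x,y,\delta) < \phi_\theta + \epsilon$.

Furthermore, $h(x+y) = hm > \phi_\theta + \epsilon$. Now,

    $f(x,y,\delta) = H(x+y) + a p\, V_\alpha(x+1,y,\delta) + a q\, V_\alpha(x,y+1,\delta)$
    $\quad = V_\alpha(x,y,\delta) + H(x+y) + a p\{V_\alpha(x+1,y,\delta) - V_\alpha(x,y,\delta)\}$
    $\quad\quad + a q\{V_\alpha(x,y+1,\delta) - V_\alpha(x,y,\delta)\} - (1-a)V_\alpha(x,y,\delta)$.

Using the fact that $\Delta V \ge 0$, $V_\alpha(x,y,\delta) \le V_\theta(x,y,\delta)$, $\alpha V_\theta < \phi_\theta + \epsilon$ and
$h(x+y) > \phi_\theta + \epsilon$, we have

    $f(x,y,\delta) \ge V_\alpha(x,y,\delta) + H(x+y) - (1-a)V_\theta(x,y,\delta)$
    $\quad = V_\alpha(x,y,\delta) + (\alpha+\lambda)^{-1}\left(h(x+y) - \alpha V_\theta(x,y,\delta)\right)$
    $\quad > V_\alpha(x,y,\delta) + (\alpha+\lambda)^{-1}\left(h(x+y) - \phi_\theta - \epsilon\right) > V_\alpha(x,y,\delta)$.

Therefore, using (2.1) we have $V_\alpha(x,y,\delta) = g(x,y,\delta)$ and from Theorem
3.2 we obtain $V_\alpha(x',y',\delta) = g(x',y',\delta)$ for all $x' \ge x$ and $y' \ge y$.
As a result, for all $\alpha < \beta$, $x \ge 0$ and $y \ge 0$, $G_0(y) \le (\phi_\theta + \epsilon)h^{-1} - y$
and $G_1(x) \le (\phi_\theta + \epsilon)h^{-1} - x$. Also note that there are only a finite
number of these functions $G_0$ and $G_1$ because $G_\delta$ is a nonnegative
integer valued function defined for nonnegative integer values of its
argument. Furthermore, a policy is completely specified by the pair
$(G_0, G_1)$ and there are only a finite number of such pairs. Therefore,
$\Pi(\beta)$ is finite.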

Lemma 3.4. For each $\pi \in \Pi(\beta)$, the underlying Markov chain is irreducible.

Proof. In the following we use the notation $x \xrightarrow{\,p\,} y$ to indicate that
the probability of transition from state $x$ to state $y$ is at least
$p$. Let $\pi$ be a policy described in Theorem 3.2 and $\beta$ be the discount
factor defined in Lemma 3.3. Using (2.11), (2.12), Theorem 3.2, Lemma
3.3 and the fact $G_\delta \le Q$, we have

    $(0,0,0) \xrightarrow{\,p^{G_0(0)}\,} (G_0(0),0,0)$  and  $(G_0(0),0,0) \xrightarrow{\,p_{ij}\,} (i,j,1)$.

Therefore, $(0,0,0)$ communicates with all other states. Now suppose the
system is in state $(x,y,\delta)$, $x < Q$, $Q \le y < 2Q$ and $\delta = 1$; then

    $(x,y,1) \xrightarrow{\,p_{00}\,} (x,y-Q,0) \xrightarrow{\,p^{G_0(y-Q)-x}\,} (G_0(y-Q),y-Q,0) \xrightarrow{\,p_{00}\,} (0,y-Q,1)$
    $\quad \xrightarrow{\,q^{G_1(0)-y+Q}\,} (0,G_1(0),1) \xrightarrow{\,p_{00}\,} (0,0,0)$.

In a similar fashion, we can show that for all $x \ge 0$ and $y \ge 0$,
$(x,y,\delta)$ communicates with $(0,0,0)$. Therefore, the embedded Markov
chain is irreducible.

Under each $\pi \in \Pi(\beta)$, the resulting queueing process can be
analyzed by the embedded Markov chain or semi-Markov process. Since
the embedded Markov chain is irreducible, the semi-Markov process is
either positive recurrent or null recurrent (transient). In the first
case, by the strong law of large numbers for these processes, the
average queue length is finite and the average cost $\phi_\pi < \infty$. In the
second case the expected queue length is infinite and so is the average
cost $\phi_\pi$. In either case by an Abelian theorem $\alpha V_\pi \to \phi_\pi$ as $\alpha \to 0$,
where $+\infty$ can be included as a possible limit (Lemma 5.1 of [3]).
Furthermore, since $\Pi(\beta)$ is finite, there is a $\pi^* \in \Pi(\beta)$ and
a sequence of discount factors $\{\alpha_n\}$, $\alpha_n \to 0$, such that $\pi^*$ is
$\alpha_n$-optimal.

* 

Lemma 3.5. If $h > 0$, then $\pi^*$ is an optimal average cost policy.

Proof. Since $h > 0$, we can find an $\alpha$ such that $h > \alpha R/Q$, and the
hypothesis of Lemma 3.3 is satisfied. Now, for any policy $\pi$ (not
necessarily in $\Pi(\beta)$), using Theorem 1 on p. 181 of [12], we get

    $\phi_\pi \ge \lim_{\alpha \to 0} \alpha V_\pi \ge \lim_{n \to \infty} \alpha_n V_{\pi^*(\alpha_n)} = \lim_{n \to \infty} \alpha_n V_{\pi^*} = \lim_{\alpha \to 0} \alpha V_{\pi^*} = \phi_{\pi^*}$.

This completes the proof.

Remark: From (3.1) we know $\phi_{\pi^*} < \infty$. Since the embedded Markov chain
is irreducible, the resulting semi-Markov process is positive
recurrent.


Theorem 3.6. If $h > 0$, then there are monotone decreasing functions
$G_\delta \le Q$, $\delta = 0, 1$, such that if $\delta = 0$ ($\delta = 1$) and the state of the
system is $(x,y,\delta)$, then it is optimal to dispatch if and only if
$x \ge G_0(y)$ ($y \ge G_1(x)$).

Proof. Clearly $\pi^*$ as defined in Lemma 3.5 is optimal. But $\pi^*$ is
also an $\alpha_n$-optimal policy and hence using Theorem 3.2 we obtain the
desired result.

The following theorem sharpens the bounds on $G_0$ and $G_1$ developed
in Lemma 3.3.

Theorem 3.7. For any $\delta \in \{0,1\}$, let

(3.5)  $S_0(y) = h^{-1}\phi - \{y + \lambda_1\mu + h^{-1}\lambda_1 \sum p_{ij}\, \Delta v_1(i,\ y+1+j,\ 1)\}$,
       $S_1(x) = h^{-1}\phi - \{x + \lambda_0\mu + h^{-1}\lambda_0 \sum p_{ij}\, \Delta v_0(x+1+i,\ j,\ 0)\}$;

then

(3.6)  $G_\delta(\cdot) = \min\{S_\delta(\cdot),\ Q\}$.
Proof. Suppose $\delta = 0$ and $x < G_0(y)$; then it suffices to show that
$v(x,y,0) = \hat{f}(x,y,0)$. Suppose the assertion is false and $v(x,y,0) =
\hat{g}(x,y,0)$. Then from (3.6) and Theorem 3.6, we have $x \le Q-1$ and
$v(x',y',0) = \hat{g}(x',y',0)$ for all $x' \ge x$, $y' \ge y$. Now using (2.10)-
(2.12), we obtain

    $\hat{f}(x,y,0) = h\lambda^{-1}(x+y) + p\,v(x+1,y,0) + q\,v(x,y+1,0) - \lambda^{-1}\phi$
    $\quad = v(x,y,0) + h\lambda^{-1}(x+y) + p\{v(x+1,y,0) - v(x,y,0)\}$
    $\quad\quad + q\{v(x,y+1,0) - v(x,y,0)\} - \lambda^{-1}\phi$
    $\quad = v(x,y,0) + h\lambda^{-1}(x+y) + q\{h\mu + \sum p_{ij}\, \Delta v_1(i,\ y+1+j,\ 1)\} - \lambda^{-1}\phi$
    $\quad = v(x,y,0) + h\lambda^{-1}\{x - [h^{-1}\phi - y - \lambda_1\mu - h^{-1}\lambda_1 \sum p_{ij}\, \Delta v_1(i,\ y+1+j,\ 1)]\}$
    $\quad = v(x,y,0) + h\lambda^{-1}[x - S_0(y)] < v(x,y,0)$.

But this contradicts (2.10) and hence $v(x,y,0) = \hat{f}(x,y,0)$. Now to
prove the converse, suppose that $v(x,y,0) = \hat{f}(x,y,0)$. Then clearly
$x \le Q-1$ and

    $\hat{f}(x,y,0) = h\lambda^{-1}(x+y) + p\,v(x+1,y,0) + q\,v(x,y+1,0) - \lambda^{-1}\phi$
    $\quad = v(x,y,0) + h\lambda^{-1}(x+y) + p\{v(x+1,y,0) - v(x,y,0)\}$
    $\quad\quad + q\{v(x,y+1,0) - v(x,y,0)\} - \lambda^{-1}\phi$.

But from the remark following Theorem 3.1, we know that $\Delta v_1(x,y+1,0) \ge
\Delta\hat{g}_1(x,y+1,0)$ and $\Delta v_0(x+1,y,0) \ge 0$. Therefore

    $\hat{f}(x,y,0) - v(x,y,0) \ge h\lambda^{-1}(x+y) + q\{h\mu + \sum p_{ij}\, \Delta v_1(i,\ y+1+j,\ 1)\} - \lambda^{-1}\phi$
    $\quad = h\lambda^{-1}[x - S_0(y)]$.

But $\hat{f}(x,y,0) - v(x,y,0) = 0$ and hence $x \le S_0(y)$. This completes
the proof for $\delta = 0$. For $\delta = 1$ the proof is essentially the same
as above.

Also  note  that  the  above  theorem  could  be  obtained  directly  from 
Equations  (2.l)-(2.3).  In  this  case  the  proof  is  similar  to  that  of 
Lemma 

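Remark 3.8: For any policy $\pi$, let $m_\delta(i) = h^{-1}\phi_\pi - (i + \lambda_{1-\delta}\mu)$; then
$m_\delta \ge G_\delta$. This follows from the fact that $\Delta v \ge 0$ and hence using (3.5)
and (3.6), we obtain

    $m_\delta - G_\delta \ge m_\delta - S_\delta = h^{-1}(\phi_\pi - \phi) + h^{-1}\lambda_{1-\delta} \sum p_{ij}\, \Delta v(\cdot,\cdot,1-\delta) \ge 0$.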

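The following lemma is valid for the special case $Q = \infty$.

Lemma 3.9. Let $Q = \infty$; then

(i)  $\Delta^2 V_0(x,y,0) = \Delta V_0(x,y,0) - \Delta V_0(x-1,y,0) \le 0$,

(ii)  $\Delta^2 V_1(x,y,1) \le 0$.

Proof. The proof is by induction using the finite period problem defined
in (2.6). Here $\Delta^2\omega_\delta(x,y,z) = \Delta\omega_\delta(x,y,z) - \Delta\omega_\delta(x-1+\delta,\ y-\delta,\ z)$.
First we show that if $\Delta^2 f^k_\delta(x,y,\delta) \le 0$ and $\Delta^2 g^k_\delta(x,y,\delta) = 0$,
then $\Delta^2 V^k_\delta(x,y,\delta) \le 0$. Suppose $\delta = 0$; then from (2.6) we know that
either $V^k(x-1,y,0) = f^k(x-1,y,0)$, or $V^k(x-1,y,0) = g^k(x-1,y,0)$.
In the first case

    $\Delta V^k_0(x,y,0) \le \Delta f^k_0(x,y,0)$  and  $\Delta V^k_0(x-1,y,0) \ge \Delta f^k_0(x-1,y,0)$

and hence

    $\Delta^2 V^k_0(x,y,0) \le \Delta f^k_0(x,y,0) - \Delta f^k_0(x-1,y,0) = \Delta^2 f^k_0(x,y,0) \le 0$.

In the second case

    $\Delta V^k_0(x,y,0) \le \Delta g^k_0(x,y,0)$,  $\Delta V^k_0(x-1,y,0) \ge \Delta g^k_0(x-1,y,0)$

and hence

(3.7)  $\Delta^2 V^k_0(x,y,0) \le \Delta^2 g^k_0(x,y,0) = 0$.

Similarly we can show that

(3.8)  $\Delta^2 V^k_1(x,y,1) \le 0$.

Furthermore from (2.6)-(2.8) we have for all $k \ge 1$, $\Delta^2 g^k_\delta(x,y,\delta) = 0$
and for $k = 1$,

(3.9)  $\Delta^2 f^k_\delta(x,y,\delta) = \Delta f^k_\delta(x,y,\delta) - \Delta f^k_\delta(x-1+\delta,\ y-\delta,\ \delta) = 0$.

We now show that $\Delta^2 f^k_\delta(x,y,\delta) \le 0$ for all $k \ge 1$. By (3.9) this holds
for $k = 1$; assume it to be true for all $k \le n$. Then (3.7) and (3.8) are
true for all $k \le n$. Then from the definition of $f^{n+1}(x,y,\delta)$, we have

    $\Delta^2 f^{n+1}_\delta(x,y,\delta) = a p\, \Delta^2 V^n_\delta(x+1,y,\delta) + a q\, \Delta^2 V^n_\delta(x,y+1,\delta) \le 0$.

Therefore, $\Delta^2 V^n_\delta(x,y,\delta) \le 0$ for all $n \ge 1$. Since $V^n \to V$ as $n \to \infty$,
we get $\Delta^2 V^n_\delta \to \Delta^2 V_\delta$ and $\Delta^2 V_\delta(x,y,\delta) \le 0$.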

o o o — 

Remark 3.10: As a consequence of Lemma 3.9, we immediately conclude
$\Delta^2 v_\delta(x,y,\delta) \le 0$. Now using (3.5) and (3.6) one can show that $S_0(y-1) -
S_0(y) \le 1$ and $S_1(x-1) - S_1(x) \le 1$. Therefore if $v(x,y,0) = \hat{g}(x,y,0)$
($v(x,y,1) = \hat{g}(x,y,1)$), then $v(x+1,y-1,0) = \hat{g}(x+1,y-1,0)$
($v(x-1,y+1,1) = \hat{g}(x-1,y+1,1)$).
4.  Computation of Optimal Average Cost Policy

In this section we develop algorithms for computing the control
functions $G_0$ and $G_1$. We treat the cases $Q = \infty$ and $Q < \infty$ separately.
Our main tool for developing these algorithms is the system of equations
(2.10)-(2.13).

1.  Case $Q = \infty$.

Let $\theta$ be the policy described in Lemma 3.3; then using the discussion
at the beginning of Section 3, the average cost of the policy $\theta$ is
$\phi_\theta = \{2R + h\lambda(\sigma^2 + \mu^2)\}/2\mu$. From Remark 3.8, we know

(4.1)  $G_0(y) \le h^{-1}\phi_\theta - \lambda_1\mu - y$  and  $G_1(x) \le h^{-1}\phi_\theta - \lambda_0\mu - x$.


Set $M = [h^{-1}\phi_\theta - \lambda_1\mu]$ and $N = [h^{-1}\phi_\theta - \lambda_0\mu]$, where the closed brackets
denote the integer larger than or equal to the number within the brackets.
Then from (4.1), it follows that for $\delta = 0$ ($\delta = 1$) and $x+y \ge M$
($x+y \ge N$), $v(x,y,\delta) = \hat{g}(x,y,\delta)$. Furthermore, we assume that $M \ge N$;
otherwise we renumber the terminals accordingly. Using (2.10)-(2.13)
and writing $\bar{\delta} = (1-\delta)$ and $r = R + h\lambda\sigma^2/2$, the optimal average $\phi$
satisfies the following functional equation

(4.2)  $v(x,y,\delta) = \min\{\, h\lambda^{-1}(x+y) + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta) - \lambda^{-1}\phi,$
       $\quad r - \mu\phi + h\mu[x\delta + \bar{\delta}y] + \sum_{i \ge 0,\ j \ge 0} p_i \hat{p}_j\, v(i + x\delta,\ j + y\bar{\delta},\ \bar{\delta}) \,\}$.
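
The truncation levels $M$ and $N$ are simple to evaluate once the cost of the always-dispatch policy is known. The sketch below assumes the closed form $\phi_\theta = \{2R + h\lambda(\sigma^2 + \mu^2)\}/2\mu$ for $Q = \infty$, with $\sigma^2$ read as the second moment of the travel time, and takes the bracket to mean rounding up to the next integer. The function names and the numerical values are hypothetical.

```python
import math


def always_dispatch_cost(R, h, lam0, lam1, mu, second_moment):
    """Average cost of the never-hold policy for Q = infinity (assumed closed form)."""
    lam = lam0 + lam1
    return (2.0 * R + h * lam * (second_moment + mu ** 2)) / (2.0 * mu)


def truncation_levels(phi, h, lam0, lam1, mu):
    """Return (M, N): M bounds x+y at terminal 0 and N bounds x+y at terminal 1."""
    M = math.ceil(phi / h - lam1 * mu)
    N = math.ceil(phi / h - lam0 * mu)
    return M, N


if __name__ == "__main__":
    # Hypothetical data: R = 10, h = 1, lam0 = 2, lam1 = 1, mu = 1, E[tau^2] = 1.
    phi_theta = always_dispatch_cost(10.0, 1.0, 2.0, 1.0, 1.0, 1.0)
    print(phi_theta, truncation_levels(phi_theta, 1.0, 2.0, 1.0, 1.0))
```

With these values the terminal with the larger arrival rate is numbered 0, so that $M \ge N$ holds as the text requires. Any policy with a smaller average cost than $\phi_\theta$ would give smaller levels when its cost is passed to `truncation_levels`.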

Now, using the definitions of $M$, $N$ and $\hat{g}$ and Theorem 3.7, we obtain the
following equalities

(4.3)  $v(x,y,0) = v(M-y,y,0)$ for $x+y \ge M \ge y+1$,
       $v(x,y,0) = v(0,y,0)$ for $y \ge M$,
       $v(x,y,1) = v(x,N-x,1)$ for $x+y \ge N \ge x+1$,
       $v(x,y,1) = v(x,0,1)$ for $x \ge N$,

(4.4)  $v(x,y,0) = (y-N)h\mu + v(x,N,0)$ for $y \ge N$,
       $v(x,y,0) = (y-N)h\mu + v(M-N,N,0)$ for $y \ge N$, $x \ge M-N$,
       $v(x,y,1) = (x-M)h\mu + v(M,0,1)$ for $x \ge M$.


Using (4.3) and (4.4), we reduce the infinite state minimization problem
(4.2) into a finite state problem. Let

(4.5)  $R(x) = \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} p_i \hat{p}_j\, v(x+i,\ j,\ 0)$  and  $\hat{R}(y) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} p_i \hat{p}_j\, v(i,\ y+j,\ 1)$.

Then by repeated use of (4.3) and (4.4) and lengthy algebraic manipulation,
we have for $x \le M-1$

(4.6)  $R(x) = \sum_{j=0}^{M-x-1} \hat{p}_j \sum_{i=0}^{M-x-j-1} p_i\, v(x+i,j,0) + \sum_{j=0}^{M-x-1} \hat{p}_j\, P(M-x-j)\, v(M-j,j,0)$
       $\quad + \sum_{j=M-x}^{M} \hat{p}_j\, v(M-j,j,0) + \sum_{j=M+1}^{\infty} \hat{p}_j\,(j-M)h\mu + \hat{P}(M+1)\, v(0,M,0)$,

where

(4.7)  $P(i) = \sum_{k=i}^{\infty} p_k$  and  $\hat{P}(i) = \sum_{k=i}^{\infty} \hat{p}_k$.

And for $x \ge M$

(4.8)  $R(x) = \sum_{j=0}^{M} \hat{p}_j\, v(M-j,j,0) + \sum_{j=M+1}^{\infty} \hat{p}_j\,(j-M)h\mu + \hat{P}(M+1)\, v(0,M,0)$.
Similarly for $y \le N-1$, we obtain

(4.9)  $\hat{R}(y) = \sum_{i=0}^{N-y-1} p_i \sum_{j=0}^{N-y-i-1} \hat{p}_j\, v(i,y+j,1) + \sum_{i=0}^{N-y-1} p_i\, \hat{P}(N-y-i)\, v(i,N-i,1)$
       $\quad + \sum_{i=N-y}^{N} p_i\, v(i,N-i,1) + \sum_{i=N+1}^{M} p_i\, v(i,0,1)$
       $\quad + \sum_{i=M+1}^{\infty} p_i\,(i-M)h\mu + P(M+1)\, v(M,0,1)$,

and for $y \ge N$

(4.10)  $\hat{R}(y) = \sum_{i=0}^{N} p_i\, v(i,N-i,1) + \sum_{i=N+1}^{M} p_i\, v(i,0,1) + \sum_{i=M+1}^{\infty} p_i\,(i-M)h\mu$
        $\quad + P(M+1)\, v(M,0,1)$.
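
The reduction of the infinite sum for $R(x)$ to finitely many unknowns can be checked numerically. The sketch below is an illustration only: it assumes a deterministic travel time, so that the numbers of arrivals during a trip are Poisson, fills the finitely many free values of $v(\cdot,\cdot,0)$ with arbitrary numbers, extends them using the boundary structure ($v$ constant in $x$ once $x+y \ge M$, and growing by $h\mu$ per unit of $y$ beyond $M$), and compares the brute-force double sum with the reduced expression for $x \le M-1$. All names and parameter values are hypothetical.

```python
import math


def poisson_pmf(mean, n_terms):
    """Poisson probabilities p_0..p_{n_terms-1}, computed recursively."""
    pmf = [math.exp(-mean)]
    for k in range(1, n_terms):
        pmf.append(pmf[-1] * mean / k)
    return pmf


def make_v0(table, M, h, mu):
    """Extend values given on {x+y <= M} to all (x, y) for carrier position 0."""
    def v0(x, y):
        if y > M:                       # linear growth in y beyond M
            return (y - M) * h * mu + table[(0, M)]
        if x + y >= M:                  # constant in x on and beyond the boundary
            return table[(M - y, y)]
        return table[(x, y)]            # free interior value
    return v0


def R_direct(x, v0, p, p_hat):
    """Brute-force double sum over the (truncated) arrival distributions."""
    return sum(p[i] * p_hat[j] * v0(x + i, j)
               for i in range(len(p)) for j in range(len(p_hat)))


def R_reduced(x, v0, p, p_hat, M, h, mu):
    """Reduced expression for R(x), valid for x <= M-1."""
    tail = lambda pmf, i: sum(pmf[i:])
    total = 0.0
    for j in range(M - x):              # j = 0, ..., M-x-1
        total += p_hat[j] * sum(p[i] * v0(x + i, j) for i in range(M - x - j))
        total += p_hat[j] * tail(p, M - x - j) * v0(M - j, j)
    total += sum(p_hat[j] * v0(M - j, j) for j in range(M - x, M + 1))
    total += sum(p_hat[j] * (j - M) * h * mu for j in range(M + 1, len(p_hat)))
    total += tail(p_hat, M + 1) * v0(0, M)
    return total


def max_reduction_error(M=5, h=1.0, mu=1.0, lam0=2.0, lam1=1.0, n_terms=60):
    """Largest gap between the two evaluations over x = 0, ..., M-1."""
    p = poisson_pmf(lam0 * mu, n_terms)
    p_hat = poisson_pmf(lam1 * mu, n_terms)
    table = {(x, y): 0.3 * x * x + 0.7 * y + math.sin(x + 2 * y)
             for x in range(M + 1) for y in range(M + 1 - x)}
    v0 = make_v0(table, M, h, mu)
    return max(abs(R_direct(x, v0, p, p_hat) - R_reduced(x, v0, p, p_hat, M, h, mu))
               for x in range(M))


if __name__ == "__main__":
    print(max_reduction_error())
```

The two evaluations should agree up to floating-point rounding and the negligible truncated Poisson tail. The same pattern, with the roles of the two terminals exchanged, could be used to check the expressions for $\hat{R}(y)$.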


Note that the infinite sums $R(x)$ and $\hat{R}(y)$ depend on $v(i,j,0)$,
$i+j \le M$; $v(i,j,1)$, $i+j \le N$; and $v(i,0,1)$ for $i = N+1, N+2, \ldots, M$.
Furthermore, the values of $R(x)$ and $\hat{R}(y)$ are respectively independent
of $x$ and $y$ for $x \ge M$ and $y \ge N$. We can also express $\hat{R}(y)$ in
terms of $R(x)$ as follows. First note that

(4.11)  $\hat{R}(y) = r + h\mu^2\lambda_0 - \mu\phi + \sum_{i=0}^{M-1} p_i R(i) + P(M) R(M)$  for $y \ge N$.

Then using (4.9), we have for $y \le N-1$

(4.12)  $\hat{R}(y) = \sum_{i=0}^{N-y-1} p_i \sum_{j=0}^{N-y-i-1} \hat{p}_j\, v(i,y+j,1) + \sum_{i=0}^{N-y-1} p_i\, \hat{P}(N-y-i)\, v(i,N-i,1)$
        $\quad + \sum_{i=N-y}^{\infty} p_i\,(r + h\mu i - \mu\phi + R(i))$.
In (4.11) and (4.12) $\hat{R}(y)$ depends on $v(i,j,1)$, $i+j \le N$, a reduction
of $(M-N)$ variables from (4.9) and (4.10). The minimization problem
(4.2) is then clearly equivalent to the following linear programming
problem

Maximize $\phi$

subject to

(4.13)  $v(x,y,\delta) - p\,v(x+1,y,\delta) - q\,v(x,y+1,\delta) + \lambda^{-1}\phi \le h\lambda^{-1}(x+y)$
        for $\delta \in \{0,1\}$, $x+y \le M\bar{\delta} + N\delta - 1$,

(4.14)  $v(x,y,\delta) - \sum p_i \hat{p}_j\, v(i + x\delta,\ j + y\bar{\delta},\ \bar{\delta}) + \mu\phi \le r + h\mu(x\delta + y\bar{\delta})$
        for $\delta \in \{0,1\}$, $x+y \le M\bar{\delta} + N\delta - 1$,

(4.15)  $v(x,y,\delta) - \sum p_i \hat{p}_j\, v(i + x\delta,\ j + y\bar{\delta},\ \bar{\delta}) + \mu\phi = r + h\mu(x\delta + y\bar{\delta})$
        for $\delta \in \{0,1\}$, $x+y = M\bar{\delta} + N\delta$,

        $v(x,y,\delta) \ge 0$.

If we replace $\sum p_i \hat{p}_j\, v(i + x\delta,\ j + y\bar{\delta},\ \bar{\delta})$ in (4.14) and (4.15) by the right-hand
side of (4.6)-(4.10), the above linear program has at most $M(M+1) +
N(N+1) + (M-N)$ variables. The fact that this linear program (4.13)-(4.15)
indeed gives the optimal average cost follows from the remarks at the end
of Equations (2.10)-(2.13). The optimal policy $G_0(y)$ ($G_1(x)$) is given
by the smallest value of $x$ ($y$) for given values of $y$ ($x$), for which
equality holds in the above inequalities (4.14) and (4.15). Since $M$
and $N$ depend on $\phi_\theta$, the number of variables in (4.6)-(4.10) can
be reduced by choosing a policy $\pi$ such that $\phi_\pi < \phi_\theta$ and using $\phi_\pi$
in (4.1) for computing $M$ and $N$. However, it may be difficult to find
a policy $\pi$ such that $\phi_\pi < \phi_\theta$. Moreover, the computation of $\phi_\pi$
for a given policy $\pi$ is itself a non-trivial task. In the following
we describe a policy improvement algorithm, which takes advantage of
the structure of $v(x,y,\delta)$ and the value of $\phi$ at each iteration of
the algorithm. Let $\{\pi(i)\}$ be a sequence of policies and let $S_{\pi(i)}$
be the set of states under the policy $\pi(i)$ such that the carrier is
dispatched whenever $(x,y,\delta) \in S_{\pi(i)}$. In addition, we assume that for
all $x \ge 0$ and $y \ge 0$ such that $x+y \ge M\bar{\delta} + N\delta$, $(x,y,\delta) \in S_{\pi(i)}$.
Clearly the sequence $\{\pi(i)\}$ contains the optimal policy $\pi^*$. Then
the policy improvement algorithm consists of the following steps.

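1.  Set $i = 0$, $S_{\pi(0)} = \{(x,y,\delta): x \ge 0,\ y \ge 0\}$, and compute $R(x)$ and $\hat{R}(y)$.

2.  Solve (4.16) for $\phi_{\pi(i)}$ and $v(x,y,\delta)$:

(4.16)  $v(x,y,\delta) + \lambda^{-1}\phi_{\pi(i)} = h\lambda^{-1}(x+y) + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta)$
        for $(x,y,\delta) \in \bar{S}_{\pi(i)}$,
        $v(x,y,0) + \mu\phi_{\pi(i)} = r + h\mu y + \hat{R}(y)$  for $(x,y,0) \in S_{\pi(i)}$,
        $v(x,y,1) + \mu\phi_{\pi(i)} = r + h\mu x + R(x)$  for $(x,y,1) \in S_{\pi(i)}$,

where $\bar{S}_{\pi(i)}$ is the complement of the set $S_{\pi(i)}$.

Note that the system of equations (4.16) determines $v$ only up to an
additive constant, and hence we can choose one of the variables $v$
arbitrarily (set $v(0,0,0) = 0$).
The total number of unknown variables is $\frac{1}{2}\{M(M+1) + N(N+1)\}$, which is
half of the number of variables in the linear programming formulation.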
3.  Compute $\phi_{\pi(i)}$, $M$, and $N$ using the following relationship.

(4.17)  $M = [h^{-1}\phi_{\pi(i)} - \lambda_1\mu]$,  $N = [h^{-1}\phi_{\pi(i)} - \lambda_0\mu]$  and  $S_{\pi(i+1)} = \{(x,y,\delta): x+y \ge M\bar{\delta} + N\delta\}$.

4.  For all $(x,y,\delta) \in \bar{S}_{\pi(i+1)}$, compute the test quantities $t_0$ and
$t_1$ using

(4.18)  $t_0 = \mu^{-1}[r + h\mu y + \hat{R}(y) - v(x,y,\delta)]$  for $\delta = 0$,
        $t_0 = \mu^{-1}[r + h\mu x + R(x) - v(x,y,\delta)]$  for $\delta = 1$,

(4.19)  $t_1 = \lambda[h\lambda^{-1}(x+y) + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta) - v(x,y,\delta)]$.

If $t_0 < t_1$ for some $(x,y,\delta)$, then set $S_{\pi(i+1)} = S_{\pi(i+1)} \cup \{(x,y,\delta)\}$.

5.  If $S_{\pi(i+1)} = S_{\pi(i)}$, then stop and the optimal action is determined by $S_{\pi(i)}$.
Otherwise, set $i = i+1$, recompute $R(x)$ and $\hat{R}(y)$ for the
new values of $M$ and $N$. Then go to step 2 of this algorithm.


Note that at each iteration the values of $M$ and $N$ decrease and
hence the number of unknown variables in (4.16) is reduced. Furthermore,
in Step 4 of the algorithm, we have to compute only one of the test
quantities $t_0$ and $t_1$, because if $(x,y,\delta) \in \bar S_{\pi(i)}$, then $t_0 = \phi$;
otherwise $t_1 = \phi$.

Also, considerable computational savings can be achieved by taking
advantage of the structure of $v$. For instance, if $\Delta v > \Delta g$, then
one can show that if $t_0 < t_1$ for some $(x,y,\delta)$, then for all $x' > x$
and $y' > y$, the state $(x',y',\delta)$ can also be added to $\bar S_{\pi(i+1)}$.
It seems that for the policy improvement algorithm the assertion that
$\Delta v > \Delta g$ at each iteration is true.
However, we are unable to prove this.
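
If the monotonicity assertion above is taken for granted (it is not proved in the text), a single successful test in Step 4 settles a whole block of states. The Python sketch below illustrates that bookkeeping on a finite grid; the function name, the set-of-tuples representation and the grid bounds `max_x`, `max_y` are assumptions introduced only for this example.

```python
def add_dominating_states(dispatch_set, state, max_x, max_y):
    """Return dispatch_set enlarged by `state` and every state with the
    same delta that has at least as many passengers at both terminals.

    Only valid under the unproved monotonicity assertion in the text.
    """
    x, y, delta = state
    block = {(xp, yp, delta)
             for xp in range(x, max_x + 1)
             for yp in range(y, max_y + 1)}
    return set(dispatch_set) | block
```

States already moved to the dispatch set in this way need not have their test quantities computed again in the same iteration.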
2.  Case $Q < \infty$

The computation of the optimal policy for the case $Q < \infty$ is
more complex than the infinite capacity case. When $Q < \infty$, it is not
possible to obtain equalities of the form (4.3) and (4.4), and hence
the system of equations (2.10)-(2.13) cannot be expressed as a system
of a finite number of equations. However, one could use the solution of
the case $Q = \infty$ as an approximation. Another alternative would be to
allow finite waiting space at each terminal. In the following we
suggest yet another approximate method.

Let $\theta$ be the policy described at the beginning of Section 3;
then the average cost of the policy $\theta$ is

\[
\phi_\theta = R\mu^{-1} + h\lambda(\sigma^2 + \mu^2)/2\mu + h \sum_{i=Q}^{\infty} (Q-i)(\pi_i^0 + \pi_i^1).
\tag{4.20}
\]

Set

\[
M = [h^{-1}\phi_\theta - \lambda_1\mu] \quad\text{and}\quad N = [h^{-1}\phi_\theta - \lambda_0\mu].
\tag{4.21}
\]

From Remark 3.8, it follows that if $x+y > M\bar\delta + N\delta$, then for all
$x' > x$ and $y' > y$, $v(x,y,\delta) = g(x,y,\delta)$. Now, using Theorem 3.7,
Remark 3.8, (2.10)-(2.13) and writing $r = R + h\lambda\sigma^2/2$, the optimal
average cost $\phi$ satisfies the following system of equations.

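\[
\begin{aligned}
v(x,y,\delta) = \min\bigl\{\, & h\lambda^{-1}(x+y) + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta) - \lambda^{-1}\phi, \\
& r - \mu\phi + h\mu(x+y-\delta\,y\wedge Q-\bar\delta\,x\wedge Q)
  + \sum_{i,j} p_{ij}\,v(x-\bar\delta\,x\wedge Q+i,\; y-\delta\,y\wedge Q+j,\; \delta) \,\bigr\}
\end{aligned}
\tag{4.22}
\]

for all

\[
\begin{cases}
x+y \le M-1,\ x \le Q-1 & \text{if } \delta = 0, \\
x+y \le N-1,\ y \le Q-1 & \text{if } \delta = 1,
\end{cases}
\]

and

\[
v(x,y,\delta) = r - \mu\phi + h\mu(x+y-\delta\,y\wedge Q-\bar\delta\,x\wedge Q)
  + \sum_{i,j} p_{ij}\,v(x-\bar\delta\,x\wedge Q+i,\; y-\delta\,y\wedge Q+j,\; \delta)
\]

otherwise.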
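
The only place where the capacity $Q$ enters (4.22) is through the terms $x\wedge Q$ and $y\wedge Q$, that is, the number of passengers actually carried. A minimal Python sketch of that bookkeeping follows. It assumes that $\delta = 0$ means the carrier leaves terminal 0 (so the term in $x$ applies) and $\delta = 1$ means it leaves terminal 1; the function names are hypothetical.

```python
def queues_after_dispatch(x, y, delta, Q):
    """Queue lengths left behind immediately after a dispatch.

    Implements (x - dbar * min(x, Q), y - delta * min(y, Q)) with
    dbar = 1 - delta, as in the arguments of v in (4.22).
    """
    dbar = 1 - delta
    return x - dbar * min(x, Q), y - delta * min(y, Q)


def holding_cost_during_trip(x, y, delta, Q, h, mu):
    """The term h*mu*(x + y - delta*min(y, Q) - dbar*min(x, Q)) of (4.22)."""
    left_x, left_y = queues_after_dispatch(x, y, delta, Q)
    return h * mu * (left_x + left_y)
```

For example, with $Q = 4$ a dispatch from terminal 0 in state $(5, 3, 0)$ leaves one passenger at terminal 0 and three at terminal 1.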
Note that (4.22) has infinitely many variables. We shall now reduce
the number of variables by approximating $v(x,y,\delta)$ for large values
of $x$ and $y$. Suppose $\delta = 0$, $x \gg Q$ and $y \gg Q$; then using Theorem
3.7, we know that the carrier will not wait at any terminal until the
number of waiting passengers at each terminal is less than $Q$. Let the
random variables $T_0$ and $T_1$ be the instants at which the number
of passengers at terminals 0 and 1, respectively, falls below $Q$ for
the first time, given that initially the system is in state $(x,y,0)$.
Then the policy $\theta$ is optimal for the duration $\min(T_0,T_1)$ and under
this policy

\[
\begin{aligned}
v(x,y,0) - v(x-1,y,0) &= hE(T_0), \\
v(x,y,0) - v(x,y-1,0) &= hE(T_1).
\end{aligned}
\tag{4.23}
\]

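Now, suppose $n_0$ and $n_1$ are the respective numbers of dispatches in
time $T_0$ and $T_1$. Since $Q$ passengers are removed from each terminal
at each dispatch,

\[
\begin{aligned}
-QE(n_0) + \lambda_0 E(T_0) + x &\approx Q, \\
-QE(n_1) + \lambda_1 E(T_1) + y &\approx Q.
\end{aligned}
\tag{4.24}
\]

Also note that

\[
\begin{aligned}
E(T_0) &= 2\mu E(n_0), \\
E(T_1) &= \mu + 2\mu E(n_1).
\end{aligned}
\tag{4.25}
\]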

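Then from (4.24) and (4.25) we have

\[
\begin{aligned}
E(n_0) &\approx (x-Q)/(Q-2\mu\lambda_0), \\
E(n_1) &\approx (y+\lambda_1\mu-Q)/(Q-2\mu\lambda_1).
\end{aligned}
\tag{4.26}
\]

Using (4.23), (4.26) and (4.25), we obtain

\[
\begin{aligned}
v(x,y,0) - v(x-1,y,0) &\approx h\rho_0(x-Q), \\
v(x,y,0) - v(x,y-1,0) &\approx h\rho_1(y-Q/2),
\end{aligned}
\tag{4.27}
\]

where

\[
\rho_\delta = 2\mu/(Q-2\mu\lambda_\delta) \quad \text{for } \delta = 0, 1.
\tag{4.28}
\]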


In a similar fashion, we can show that

\[
\begin{aligned}
\Delta_x v(x,y,1) &\approx h\rho_0(x-Q/2), \\
\Delta_y v(x,y,1) &\approx h\rho_1(y-Q).
\end{aligned}
\tag{4.29}
\]
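
The step from (4.24)-(4.25) to (4.27)-(4.28) is a short piece of algebra, and it can be checked numerically. The Python sketch below is one such check; the function names and the sample parameter values are assumptions chosen for illustration, and the formulas require $Q > 2\mu\lambda_\delta$ so that the denominators are positive.

```python
def expected_dispatches(x, y, Q, mu, lam0, lam1):
    """E(n_0) and E(n_1) from (4.26)."""
    n0 = (x - Q) / (Q - 2 * mu * lam0)
    n1 = (y + lam1 * mu - Q) / (Q - 2 * mu * lam1)
    return n0, n1


def rho(rate, Q, mu):
    """rho_delta of (4.28) for arrival rate lambda_delta = rate."""
    return 2 * mu / (Q - 2 * mu * rate)


def expected_times(x, y, Q, mu, lam0, lam1):
    """E(T_0) and E(T_1) obtained by substituting (4.26) into (4.25)."""
    n0, n1 = expected_dispatches(x, y, Q, mu, lam0, lam1)
    return 2 * mu * n0, mu + 2 * mu * n1
```

With $Q = 10$, $\mu = 1$, $\lambda_0 = 2$, $\lambda_1 = 3$, $x = 40$ and $y = 50$, the two expected times agree with $\rho_0(x-Q)$ and $\rho_1(y-Q/2)$ respectively, which is what (4.27) asserts after multiplying by $h$.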


Suppose $D \gg Q$; then using the approximation (4.27), (4.28), (4.7)
and (2.15), we have

\[
\begin{aligned}
R_0(x,y) &= \sum_i p_i \sum_j p_j\, v(x+i,y+j,0) \\
&= \sum_{i=0}^{D-x-1} p_i \sum_{j=0}^{D-y-1} p_j\, v(x+i,y+j,0) \\
&\quad + h\rho_0 P(D-y) \sum_{i=0}^{\infty} p_{D-x+i}\, i\,\{D-Q+(i+1)/2\} \\
&\quad + h\rho_1 P(D-x) \sum_{j=0}^{\infty} p_{D-y+j}\, j\,\{D+(j+1-Q)/2\} \\
&\quad + P(D-y)P(D-x)\,v(D,D,0)
\end{aligned}
\tag{4.30}
\]

and

\[
\begin{aligned}
R_1(x,y) &= \sum_i p_i \sum_j p_j\, v(x+i,y+j,1) \\
&= \sum_{i=0}^{D-x-1} p_i \sum_{j=0}^{D-y-1} p_j\, v(x+i,y+j,1) \\
&\quad + h\rho_0 P(D-y) \sum_{i=0}^{\infty} p_{D-x+i}\, i\,\{D+(i+1-Q)/2\} \\
&\quad + h\rho_1 P(D-x) \sum_{j=0}^{\infty} p_{D-y+j}\, j\,\{D-Q+(j+1)/2\} \\
&\quad + P(D-y)P(D-x)\,v(D,D,1).
\end{aligned}
\tag{4.31}
\]
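
The bracketed factors in (4.30) and (4.31) can be read as accumulated increments of $v$ beyond the truncation level $D$: adding the approximate differences $h\rho(D+k-Q)$ or $h\rho(D+k-Q/2)$ of (4.27) and (4.29) for $k = 1, \dots, i$ gives $i\{D-Q+(i+1)/2\}$ and $i\{D+(i+1-Q)/2\}$ respectively, up to the factor $h\rho$. This reading is an interpretation offered here, not a statement from the text, and the short Python check below (with hypothetical function names) only confirms the arithmetic identity.

```python
def factor_full_offset(D, Q, i):
    """Closed form i * {D - Q + (i + 1)/2} appearing in (4.30)-(4.31)."""
    return i * (D - Q + (i + 1) / 2)


def factor_half_offset(D, Q, j):
    """Closed form j * {D + (j + 1 - Q)/2} appearing in (4.30)-(4.31)."""
    return j * (D + (j + 1 - Q) / 2)


def summed_increments(D, offset, n):
    """Sum of (D + k - offset) for k = 1..n, the unit-step increments."""
    return sum(D + k - offset for k in range(1, n + 1))
```

Here `summed_increments(D, Q, i)` matches `factor_full_offset(D, Q, i)` and `summed_increments(D, Q / 2, j)` matches `factor_half_offset(D, Q, j)` for every non-negative integer argument.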


Note that both $R_0$ and $R_1$ depend on $v(x,y,\delta)$, $x \le D$ and $y \le D$.
As before, the first term in $R_\delta$ vanishes for $x = D$ and $y = D$.

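Now, using (4.30) and (4.31), we can rewrite the system of equations
(4.22) as follows. Set

\[
\begin{aligned}
S_\pi &= \{(x,y,0) : x+y \le M-1,\ x \le Q-1\} \cup \{(x,y,1) : x+y \le N-1,\ y \le Q-1\}, \\
\bar S_\pi &= \{(x,y,\delta) : x \le D,\ y \le D \text{ and } (x,y,\delta) \notin S_\pi\}.
\end{aligned}
\tag{4.32}
\]

Then

\[
\begin{aligned}
v(x,y,\delta) &= \min\bigl\{\, h\lambda^{-1}(x+y) - \phi\lambda^{-1} + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta), \\
&\qquad\quad r - \mu\phi + h\mu(x+y-\delta\,y\wedge Q-\bar\delta\,x\wedge Q)
   + R_\delta(x-\bar\delta\,x\wedge Q,\; y-\delta\,y\wedge Q) \,\bigr\}
   && \text{for } (x,y,\delta) \in S_\pi, \\
v(x,y,\delta) &= r - \mu\phi + h\mu(x+y-\delta\,y\wedge Q-\bar\delta\,x\wedge Q)
   + R_\delta(x-\bar\delta\,x\wedge Q,\; y-\delta\,y\wedge Q)
   && \text{for } (x,y,\delta) \in \bar S_\pi.
\end{aligned}
\tag{4.33}
\]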
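
The two state sets of (4.32) are easy to enumerate once $M$, $N$, $Q$ and $D$ are fixed. The following Python sketch does so on the grid $0 \le x, y \le D$; restricting $S_\pi$ to that grid, the function name, and the set-of-tuples representation are assumptions made for this illustration.

```python
def build_state_sets(M, N, Q, D):
    """Return (S, S_bar) as in (4.32), both restricted to 0 <= x, y <= D.

    S     : states where waiting is still a candidate action
    S_bar : remaining grid states, where the carrier is dispatched
    """
    grid = {(x, y, d)
            for x in range(D + 1)
            for y in range(D + 1)
            for d in (0, 1)}
    S = {(x, y, d) for (x, y, d) in grid
         if (d == 0 and x + y <= M - 1 and x <= Q - 1)
         or (d == 1 and x + y <= N - 1 and y <= Q - 1)}
    return S, grid - S
```

Only the states in the first set require the minimization in (4.33); for all others the second equation of (4.33) applies directly.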
Note that the minimization problem (4.33) has $D^2 + 1$ unknown variables.
This problem can be solved by the linear programming method previously
outlined for the case $Q = \infty$. We need only to replace the system of
linear inequalities (4.13)-(4.13) with the inequalities (4.33). We
could also use the policy improvement algorithm to solve (4.33). The
policy improvement algorithm for this case is similar to that of the
infinite capacity case. In the following we simply state the changes
necessary to accomplish this.


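Step 2: Replace (4.16) with the following system of equations:

\[
\begin{aligned}
v(x,y,\delta) &= h\lambda^{-1}(x+y) - \phi\lambda^{-1} + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta)
   && \text{for } (x,y,\delta) \in S_\pi, \\
v(x,y,\delta) &= r - \mu\phi + h\mu(x+y-\delta\,y\wedge Q-\bar\delta\,x\wedge Q)
   + R_\delta(x-\bar\delta\,x\wedge Q,\; y-\delta\,y\wedge Q)
   && \text{for } (x,y,\delta) \in \bar S_\pi.
\end{aligned}
\tag{4.34}
\]

Step 3: In (4.17), compute new values of $S_\pi$ and $\bar S_\pi$ using (4.32).

Step 4: Compute $t_0$ and $t_1$ using

\[
\begin{aligned}
t_0 &= \mu^{-1}\{\, r + h\mu(x+y-\delta\,y\wedge Q-\bar\delta\,x\wedge Q)
   + R_\delta(x-\bar\delta\,x\wedge Q,\; y-\delta\,y\wedge Q) - v(x,y,\delta) \,\}, \\
t_1 &= \lambda\{\, h\lambda^{-1}(x+y) + p\,v(x+1,y,\delta) + q\,v(x,y+1,\delta) - v(x,y,\delta) \,\}.
\end{aligned}
\tag{4.35}
\]

All other steps remain the same as in the case of $Q = \infty$.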

We conclude with the remark that both the linear program and the policy
improvement algorithm for the case $Q < \infty$ will give approximate results.
However, if we make $D$ very large, then the resulting error will be
small. One can also solve (4.34) for a sequence of increasing values
of $D$ and terminate the algorithm when a further increase in $D$ does
not change the optimal policy. Some insight about the error in this
procedure may be obtained by checking whether the last three terms of
$R_\delta(0,0)$ in (4.30) and (4.31) are small compared to the first term of
$R_\delta(0,0)$.
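
The suggestion of increasing $D$ until the policy stops changing can be organized as a simple outer loop. The Python sketch below assumes a user-supplied function `solve_for_D` (hypothetical, not defined in the text) that solves the truncated problem for a given $D$ and returns the resulting set of states in which the carrier waits; since that set lies inside $S_\pi$, it can be compared across different values of $D$.

```python
def solve_until_stable(solve_for_D, D_start, D_step, max_rounds):
    """Increase D until two consecutive solutions give the same policy.

    solve_for_D : callable D -> frozenset of waiting states (assumed)
    Returns (D, policy) at the first repeat, or None if the policy is
    still changing after max_rounds solves.
    """
    previous = None
    D = D_start
    for _ in range(max_rounds):
        policy = solve_for_D(D)
        if previous is not None and policy == previous:
            return D, policy
        previous = policy
        D += D_step
    return None
```

Agreement between two consecutive values of $D$ is only a practical stopping rule, not a guarantee; the check on the terms of $R_\delta(0,0)$ described above remains a useful complement.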

